Update the course material for this week

To update the course repository on your computer, you will pull a current copy of the repository. To do this:

  1. Open your terminal/bash.

  2. Navigate to the course repository. If this is in your root directory then type:

    cd
    cd EEMB595TE
  3. Paste the following into the terminal/bash:

    git pull


Last weeks objectives

  1. Learn the differences between types of mark-recapture models

  2. Learn to analyze mark-recapture data


This weeks objectives

  1. Learn the the use and applications of occupancy and N-mixture models

  2. Learn to extrend these models to multi-season frameworks


Goals

To correct for imperfect species/individual detection probability without uniquely marking individuals (like we learned last week)

How do you do this?

We correct for imperct species/individaul detection by repeatedly sampling the site unit.

Repeatedly sampling the site- known as repeated surveys:

  • Can be repeated temporal surveys

  • Assuming two sites nearby have the same probability of occupancy and detection (spatial)

Why should we seperate the ecological and observations parts of the data?

  • The extent of the species distribution will be underestimated when the detection probability is < 1

  • If a species is not found at all sites where it occures, then the perceived range will be smaller than the actual range.

  • Estimates of covariate relationships will be biased towards zero when detection probability is < 1

  • Shown via simulations

  • Factors that affect the difficulty with which a species is found may end up in a predictive models of species occurrence or may mask factors that do affect species occurence

  • Confounding the ecological and observational process (i.e., habitat A = more open, habitat B = more wooded)

Important definitions

Occupancy probability: The probability that a site is occupied (i.e., that the abundance is greater than zero)

  • The realization of this probabilty gives rise to the proportion of sites occupied (i.e., filled)

Detection probability: The probability that you will encounter an individual, given that it is present

Example

We want to know the distribution of this ant:

in the Southeast United States:

Data collection design

We sample 1000 sites for the ants and collect: detection/non-detection data

?? Why didn’t I call this presence/absence data??

At each site, we go looking for ants at the same site for two consecutive days.

Simple occupancy model

\(y_{i,j}\) = observed data = detection/non-detection of the species at site i during survey j

  • \(y_{i,j}\) = 0 (species not detected at site i during survey j)

  • \(y_{i,j}\) = 1 (species detected at site i during survey j)

\(z_{i}\) = latent state variable = presence/absence of the species at site i

  • \(z_{i}\) = 0 (species absent at site i)

  • \(z_{i}\) = 1 (species present at site i)

The model:

  1. Ecological process

    \(z_{i}\) \(\sim\) Bernoulli(\(\psi\))

    • \(\psi\) = occupancy probability

    • To make \(\psi_{i}\) a function of a covariate, we use:

      logit(\(\psi_{i}\)) = \(\alpha\) + \(\beta\) \(X_{i}\)

      • ?? Why is \(\psi\) indexed by i here??
    • Notice you can only add site level covariates to z

  2. Observation process

    \(y_{i,j}\) \(\sim\) Bernoulli(\(z_{i}\) p)

    • Note that p is conditional on \(z_{i}\) by multiplying p and z

    • ?? How would you make p a function of a covariate? And what would you need to change from the expression above?

    • Notice you can add site or survey level covariates to y


Model assumptions

Site-occupancy model assumptions (MacKenzie et al. 2002):

  1. The occupancy status of a site does not change over the course of sampling (i.e., closed population)

  2. The probability of occupancy is the same for all sites or any heterogeneity is related to covariates

  3. The probability of detection is the same across all sites and observations or any heterogeneity is related to covariates

  4. Detection histories at each location are independent

Model written in JAGS code

{
sink("model.txt")
cat("
model 
{
# Priors

psi ~ dunif(0, 1)
p ~ dunif(0, 1)

# (1) Ecological model

for(i in 1:nSites){

  z[i] ~ dbern(psi)

  for(j in 1:nSurveys){

    y[i,j] ~ dbern(p.eff[i,j])

    p.eff[i,j] <-  z[i] * p

  } # j

} # i

}
",fill = TRUE)
sink()
}

Bundle the data

dat <- read.csv("~/EEMB595TE/7_Occupancy/ant_data.csv")[,-1]

jags.data <- list( y = dat,
                   nSites = nrow(dat),
                   nSurveys = ncol(dat))

zinit <- apply(dat, 1, max)

inits <- function(){list(psi = runif(1, 0, 1),
                         p = runif(1, 0, 1),
                         z = zinit
)}

#------- Parameters monitored

params <- c("psi", 
            "p")

#------- MCMC settings
ni <- 10000
nb <- 1000
nt <- 10
nc <- 3
na <- 10000

#------ call Library
library("jagsUI")
## Loading required package: lattice
## 
## Attaching package: 'jagsUI'
## The following object is masked from 'package:utils':
## 
##     View
#------- Call JAGS from R

out <- jags(data = jags.data, inits = inits, parameters.to.save = params, model.file = "model.txt", n.chains = nc, n.thin = nt, n.iter = ni, n.burnin = nb, n.adapt = na, parallel = TRUE)
## 
## Processing function input....... 
## 
## Converting data frame 'y' to matrix.
## 
## Done. 
##  
## Beginning parallel processing using 3 cores. Console output will be suppressed.
## 
## Parallel processing completed.
## 
## Calculating statistics....... 
## 
## Done.
print(out, dig = 3)
## JAGS output for model 'model.txt', generated by jagsUI.
## Estimates based on 3 chains of 10000 iterations,
## adaptation = 10000 iterations (sufficient),
## burn-in = 1000 iterations and thin rate = 10,
## yielding 2700 total samples from the joint posterior. 
## MCMC ran in parallel for 0.565 minutes at time 2018-05-17 09:38:18.
## 
##              mean     sd     2.5%      50%    97.5% overlap0 f  Rhat n.eff
## psi         0.669  0.043    0.591    0.666    0.760    FALSE 1 1.002  1181
## p           0.267  0.019    0.231    0.267    0.304    FALSE 1 1.000  2700
## deviance 2325.268 74.022 2186.986 2321.648 2479.102    FALSE 1 1.002  1053
## 
## Successful convergence based on Rhat values (all < 1.1). 
## Rhat is the potential scale reduction factor (at convergence, Rhat=1). 
## For each parameter, n.eff is a crude measure of effective sample size. 
## 
## overlap0 checks if 0 falls in the parameter's 95% credible interval.
## f is the proportion of the posterior with the same sign as the mean;
## i.e., our confidence that the parameter is positive or negative.
## 
## DIC info: (pD = var(deviance)/2) 
## pD = 2736.5 and DIC = 5061.739 
## DIC is an estimate of expected predictive error (lower is better).

Check the chains

plot(out)

Check the parameters from the model with the truth

The data set we used was actually simulated! We know the truth!

We can add covariates.

We can add spatial autocorrelation (see Yackulic et al. 2012).

We can add multiple states (see Bailey et al. 2014).

We can add multiple seasons.

  • The principles we learned about last week about transition matrices can come into play here.

N-mixture model

N-mixture model is used in the case where we have abundance of animals at sites that are repeatedly surveys (similar to occupancy model; Royle, 2004).

Also known as the Royle model and other names.

The model:

\(x_{i,j}\) = observed data = abundance of the species at site i during survey j

\(N_{i}\) = latent state variable = true abundance of the species at site i

The model:

  1. Ecological process

    \(N_{i}\) \(\sim\) Poisson(\(\lambda\))

    • \(\lambda\) = average number of animals across the site

    • Notice you can only add site level covariates to z

  2. Observation process

    \(x_{i,j}\) \(\sim\) Binomial(\(N_{i}\), p)

    • ?? Why are we using a Binomial distribution here instead of a Bernoulli??

Model Assumptions

N-mixture model assumptions (Royle, 2004):

  1. The ecological state (e.g., abundance) is constant during the period over which replicate surveys are conducted (closure assumption)

  2. Detection probability is constant for all individuals present during the survey

  3. The distributions of abundance and detection are adequately described by the chosen parametric forms (e.g., Poisson, binomial)

  4. There are no false positives; such as double counts


Write the model in JAGS code

{
sink("model.txt")
cat("
model 
{
# Priors

lambda ~ dunif(0, 20)
p ~ dunif(0, 1)

# (1) Ecological model

for(i in 1:nSites){

  N[i] ~ dpois(lambda)

  for(j in 1:nSurveys){

    x[i,j] ~ dbin(p, N[i])

  } # j

} # i

}
",fill = TRUE)
sink()
}

Bundle the data

dat <- read.csv("~/EEMB595TE/7_Occupancy/ant_data_abund.csv")[,-1]

jags.data <- list( x = dat,
                   nSites = nrow(dat),
                   nSurveys = ncol(dat))

Ninit <- apply(dat, 1, max)

inits <- function(){list(lambda = runif(1, 0, 10),
                         p = runif(1, 0, 1),
                         N = Ninit
)}

#------- Parameters monitored

params <- c("lambda", 
            "p")

#------- MCMC settings
ni <- 10000
nb <- 1000
nt <- 10
nc <- 3
na <- 10000

#------ call Library
library("jagsUI")

#------- Call JAGS from R

out <- jags(data = jags.data, inits = inits, parameters.to.save = params, model.file = "model.txt", n.chains = nc, n.thin = nt, n.iter = ni, n.burnin = nb, n.adapt = na, parallel = TRUE)
## 
## Processing function input....... 
## 
## Converting data frame 'x' to matrix.
## 
## Done. 
##  
## Beginning parallel processing using 3 cores. Console output will be suppressed.
## 
## Parallel processing completed.
## 
## Calculating statistics....... 
## 
## Done.
print(out, dig = 3)
## JAGS output for model 'model.txt', generated by jagsUI.
## Estimates based on 3 chains of 10000 iterations,
## adaptation = 10000 iterations (sufficient),
## burn-in = 1000 iterations and thin rate = 10,
## yielding 2700 total samples from the joint posterior. 
## MCMC ran in parallel for 1.493 minutes at time 2018-05-17 09:38:54.
## 
##              mean     sd     2.5%      50%    97.5% overlap0 f  Rhat n.eff
## lambda      3.501  0.185    3.174    3.488    3.879    FALSE 1 1.003   611
## p           0.336  0.017    0.303    0.336    0.369    FALSE 1 1.003   566
## deviance 7022.033 68.484 6889.266 7022.100 7158.912    FALSE 1 1.002  1021
## 
## Successful convergence based on Rhat values (all < 1.1). 
## Rhat is the potential scale reduction factor (at convergence, Rhat=1). 
## For each parameter, n.eff is a crude measure of effective sample size. 
## 
## overlap0 checks if 0 falls in the parameter's 95% credible interval.
## f is the proportion of the posterior with the same sign as the mean;
## i.e., our confidence that the parameter is positive or negative.
## 
## DIC info: (pD = var(deviance)/2) 
## pD = 2342.2 and DIC = 9364.217 
## DIC is an estimate of expected predictive error (lower is better).

Check the chains

plot(out)

Multi-season occupancy model

This is the multi-season extension of the occupancy model.

Assume periods of open population dynamics between periods of closed population dynamics (when you do the surveys).

There is code in the folder to run this model.

You estimate:

  • True occupancy per season

  • Extinction rate for habitats

  • Colonization rates for habitats

  • Species detection probability


Generalized N-mixture model

This is the multi-season extension of the N-mixture model.

Assume periods of open population dynamics between periods of closed population dynamics (when you do the surveys).

There is code in the folder to run this model.

You estimate:

  • True average abundance

  • Apparent survival probability (survival + emigration)

  • Recruitment rate (immigration + births)

  • Individual detection probability